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Abstract 

Background: Current HIV-1 envelope glycoprotein (Env) vaccines are unable to induce cross-reactive neutralizing 
antibodies. However, such antibodies are elicited in 10-30% of HIV-1 infected individuals, but it is unknown why 
these antibodies are induced in some individuals and not in others. We hypothesized that the Envs of early HIV-1 
variants in individuals who develop cross-reactive neutralizing activity (CrNA) might have unique characteristics that 
support the induction of CrNA. 

Results: We retrospectively generated and analyzed env sequences of early HIV-1 clonal variants from 31 individuals 
with diverse levels of CrNA 2-4 years post-seroconversion. These sequences revealed a number of Env signatures 
that coincided with CrNA development. These included a statistically shorter variable region 1 and a lower 
probability of glycosylation as implied by a high ratio of NXS versus NXT glycosylation motifs. Furthermore, lower 
probability of glycosylation at position 332, which is involved in the epitopes of many broadly reactive neutralizing 
antibodies, was associated with the induction of CrNA. Finally, Sequence Harmony identified a number of amino 
acid changes associated with the development of CrNA. These residues mapped to various Env subdomains, but in 
particular to the first and fourth variable region as well as the underlying a2 helix of the third constant region. 

Conclusions: These findings imply that the development of CrNA might depend on specific characteristics of early 
Env. Env signatures that correlate with the induction of CrNA might be relevant for the design of effective HIV-1 
vaccines. 

Keywords: Early HIV-1 variants, Envelope glycoprotein, Signatures, Cross-reactive neutralizing activity, Glycosylation, 
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Background subtypes is appealing. Unfortunately, this has shown to be 
The identification of the human immunodeficiency virus a difficult hurdle, and none of the vaccine candidates 
type-1 (HIV-1), as the causative agent of the acquired tested to date have been able to induce this type of 
immunodeficiency syndrome (AIDS), initiated a long- humoral immunity [2-4], The moderate efficacy against 
term but as of yet unsuccessful search for an effective acquisition of infection in the RV144 vaccine trial [5], in 
and safe HIV-1 vaccine. Ideally a vaccine should elicit both which VI V2 IgG antibodies correlated inversely with the 
humoral and cellular immunity [1]. In particular the in- rate of infection [6], has lead to optimism in the HIV-1 
duction of cross-reactive neutralizing activity (CrNA) that vaccine field. In this trial, only very low titer, tier 1 neutral- 
is capable of neutralizing HIV-1 variants from different izing antibodies (NAbs) were detected and efficacy may 

improve with a vaccine that is capable of eliciting a cross- 
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diates entry into host cells and is the sole target for 



© 2013 van den Kerkhof et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the 
BiolVlGCl C6ntTcll Creative Commons Attribution License (http://creativecommons.Org/licenses/by/2.0), which permits unrestricted use, 
distribution, and reproduction in any medium, provided the original work is properly cited. 



van den Kerkhof et al. Retrovirology 2013, 10:102 
http://www.retrovirology.eom/content/1 0/1 /1 02 



Page 2 of 19 



NAbs [7,8]. A functional Env consists of heterotrimers 
of three gpl20 subunits, non-covalently linked to three 
gp41 molecules that anchor the Env spike in the viral 
membrane. Gpl20 has five conserved (C1-C5) and five 
variable regions (VI -V5) [9,10]. The conserved regions 
form the core of the protein and are crucial for binding 
to the CD4-receptor on target cells [11-13]. The variable 
regions are highly diverse in sequence as a consequence 
of high replication rates, recombination, deletions, inser- 
tions and mutations [14]. Gpl20 and gp41 contain 20-35 
and 3-5 potential AMinked glycosylation sites (PNGS), 
respectively. AZ-linked glycans compose approximately half 
of the molecular mass of gpl20 [15] and are required for 
correct protein folding, binding to lectin receptors on 
immune cells, as well as for immune evasion [16-20]. 

Within weeks to months after primary infection, HIV-1 
Env specific antibodies appear [21] that are generally 
limited in their neutralizing activity and restricted to 
early autologous viruses [22-25]. The first antibodies are 
directed against gp41, predominantly to the immunodo- 
minant epitope, followed by non-NAbs against gpl20, 
which in clade B-infected individuals often target the 
V3 region [26]. Subsequently, weakly neutralizing V3 
antibodies capable of neutralizing heterologous tier 1 
HIV-1 isolates appear [27] followed by NAbs with other 
epitope specificities from which the virus rapidly escapes 
through sequence changes in the variable loops and an 
increasing number of PNGS. Specifically, an increase in 
length and number of PNGS of the VI V2 region plays a 
role in HIV-1 resistance to NAbs [22,28-32]. 

Within 2-4 years post-seroconversion (post-SC), 10-30% 
of the HIV-1 infected individuals develop CrNA [33-40], 
which is directed against conserved epitopes such as the 
CD4-binding site (CD4BS), the membrane proximal region 
(MPER) in gp41 and glycan dependent epitopes that often 
include the glycans at positions 160 or 332 [41,42]. Over 
the years, a number of broadly reactive neutralizing 
monoclonal antibodies (bNAbs) have been isolated from 
HIV-infected individuals with CrNA. These antibodies 
show broad and potent activity against different HIV-1 
subtypes. As HIV-1 can rapidly escape from autologous 
humoral immunity with high levels of CrNA [25,34], this 
immune response is not associated with prolonged 
asymptomatic survival of the individuals that make 
them [8,21,25,36,43,44]. Nonetheless, bNAbs can pro- 
tect non-human primates against viral challenge in 
passive immunization studies [45-49], which supports 
the idea that an HIV-1 vaccine that elicits CrNA could 
be effective against HIV-1 acquisition in humans. 

BNAbs delineate sites of vulnerability on the Env 
spike. These sites can be divided into four bNAb epitope 
clusters located at different positions on the Env trimer 
[50-53]. Three epitope clusters are located on gpl20, 
and one on gp41. The first epitope cluster is the CD4BS, 



which is the epitope for multiple bNAbs such as bl2, 
VRC01, VRC-PG04, NIH45-46 and 3BNC117 [54-58]. 
The more recently discovered PGT121-130 and PGT135 
bNAbs, as well as 2G12, target the outer domain on 
gpl20 and are N332 glycan dependent [52,59-63]. 
BNAbs PG9, PG16, PGT145 and CH01-CH04 target 
conserved epitopes within VI V2 that are expressed on 
trimeric Env [52,53,64-66]. Preferential binding of PG9 
to the quaternary structure of trimeric Env could be 
explained by its binding to an epitope at the apex of the 
trimer that constitutes elements from two protomers [67] . 
PG9, PG16 and PGT145 neutralization is dependent on 
the presence of glycans, especially at position N160 
[64,68]. One epitope cluster is located in the MPER of 
gp41, which contains the binding sites for 2F5, 4E10 and 
10E8 [69-72]. 

The extensive glycosylation of gpl20, amounting to 
-50% of its molecular weight, was long thought to con- 
tribute to an immunologically "silent face" and serve as a 
"glycan shield" [8,73]. The recent identification of many 
glycan-dependent bNAbs shows that the silent face may 
not be so immunologically silent after all, and that this 
shield can be penetrated and/or used by bNAbs. The 
epitope cluster on gpl20 which is targeted by the bNAbs 
PGT121-130, PGT135 and 2G12 [60-62,67,74], requires 
a specific glycan at position 332, although there is a 
difference in how the different bNAbs approach this 
glycan. In addition, neutralization activity in serum of 
infected humans and macaques, which developed CrNA, 
is dependent on recognition of the epitope involving this 
glycan at position 332 [36,75,76]. Interestingly, Moore et al 
recently showed that the 332 glycan was absent on the Env 
of early viruses from two clade C infected individuals but 
emerged as a means of escape from autologous neutralizing 
responses, thereby creating 332 glycan-dependent bNAb 
epitopes [77]. For these reasons it seems that the glycosyla- 
tion at position 332 plays a substantial role in the develop- 
ment of CrNA. 

Although the existence of bNAbs in natural infection 
is testimony that the native Env spike can elicit bNAbs, 
it remains unknown why this occurs in only 10-30% of 
HIV-1 infected individuals. Here, we hypothesized that 
specific properties of early Envs could contribute to the 
development of CrNA. Information on such properties 
would obviously be valuable for vaccine design aimed at 
generating similar CrNA [78]. To test our hypothesis, 
we retrospectively examined env sequences from early 
HIV-1 clonal variants in clade B infected individuals 
that developed diverse levels of CrNA later on during 
infection. We found that CrNA development correlated 
with early HIV-1 variants with shorter VI regions, lower 
probability of glycosylation, and specific amino acid 
usage. These properties might open up avenues for 
vaccine design. 
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Results 

Short VI sequences correlate with the development of 
cross-reactive neutralizing activity 

To study Env determinants that may influence the in- 
duction of CrNA, we retrospectively generated env 
sequences from early HIV-1 variants in 31 individuals 
who at 2-4 years post-SC had diverse levels of CrNA in 
their serum (Figure 1) [35,79]. We chose this experi- 
mental setup because contemporaneous viruses usually 
already have escaped from CrNA [25,34] and early 
viruses are proposed to be a major determinant for the 
induction of CrNA [80]. The individuals were matched 
for time between SC and CrNA measurement, time 
between SC and clonal virus isolation, CD4 + T cells/ ul 
blood at set-point and viral load at set-point (Additional 
file 1: Figure SI). Data on HIV-1 neutralizing activity in 
serum were available from previous studies (n = 292) 
[35,40,79]. In short; sera were tested by Monogram 
Biosciences [81] for cross-reactive neutralizing activity 
in a pseudovirus assay involving six tier two viruses with 
Envs from primary isolates of HIV-1 subtypes A 
(94UG103), B (92BR020 and JRCSF), C (93IN905 and 
MGRM-C-026) and CRF01_AE (92TH021). This six viral 
panel covered 93% of the variation in neutralization of a 
larger pseudovirus panel (n = 15) [39]. It has been shown 
that classification of CrNA in sera, as determined on this 
six virus panel, was highly correlated with CrNA deter- 
mination on a larger 23 virus panel [35]. The geometric 
mean IC 50 titers in the sera of the selected individuals 
against a panel of 6 HIV-1 variants varied from 20 to 297; 
with an average of 98 (see Table 1 for details). Phylogenetic 
analysis of all sequences, using either neighbour-joining 
or maximum-likelihood methods, revealed clustering of 



sequences per individual, excluding contamination, but, 
clustering of HIV-1 Env sequences from individuals 
with similar geometric mean IC 50 titer was not observed 
(Additional file 2: Figure S2). We observed that the geo- 
metric mean IC 50 titer in serum correlated weakly with 
the mean length of VI ((Figure 2B, r = -0.36; p = 0.049) 
but not with overall mean gpl60 length (Figure 2A), nor 
with the total length of either the variable or the con- 
served regions (data not shown, r = -0.21; p = 0.25 and 
r = -0.052; p = 0.80, respectively). 

Lower probability of overall glycosylation correlates with 
the induction of cross-reactive neutralizing activity 

We did not observe a correlation between the geometric 
mean IC 50 titer in serum, and the mean total number of 
PNGS in gpl60 (Figure 2C) or the mean number of 
PNGS in the individuals variable or constant regions 
(data not shown). It has been shown that NXT motifs 
have a two to three times higher probability of becoming 
glycosylated than NXS motifs [82,83]. The total number 
of PNGS may therefore not entirely reflect the actual ex- 
tent of glycosylation of a given Env molecule. Interest- 
ingly, the mean number of NXS acceptor motifs relative 
to the mean total number of PNGS correlated positively 
with the geometric mean IC 50 titer in serum (Figure 2D, 
r = 0.41; p = 0.037), while the opposite was true for the 
mean number of NXT acceptor motifs relative to the 
mean total number of PNGS. Thus, a higher relative 
number of NXS over NXT motifs, i.e. less probability of 
glycosylation was associated with the development of 
CrNA. However, we note that we cannot make state- 
ments on the actual glycosylation of individual NXS and 
NXT motifs. 
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Figure 1 Sampling of viruses and sera. Time bars showing the 
period in which viruses were isolated and the period in which sera 
were obtained for assessment of neutralizing activity. The white, grey 
and dark grey bars represent the individuals with non-QNA (n=9), 
intermediate CrNA (n=10) and CrNA (n=12), respectively. The x-axis 
represents the months post-SC. The box-plots represent the sampling 
periods with minimum and maximum time points indicated by 
whiskers. The median time points of virus and sera sampling for the 31 
individuals are indicated by solid vertical lines in the boxes. 



Lower probability of glycosylation at position 332 
correlates with the induction of cross-reactive 
neutralizing activity 

We noted that the glycosylation motif at position 332, 
which is important for a number of bNAb epitopes 
[52,59-62,74], is always of the NXS type, and may thus 
not always be occupied by a glycan. In particular when 
the acceptor motif is NXS, the amino acid at position X 
is also important for determining the probability of gly- 
cosylation [84]. In all our sequences, the second position 
was occupied by either a leucine or an isoleucine. We 
found that individuals who developed CrNA had signifi- 
cantly more often an NLS motif at position 332, while 
individuals who did not develop CrNA had more fre- 
quently an NIS motif (Figure 3 A, p = 0.032). Studies 
with rabies virus glycoprotein showed that an NLS motif 
has a two times lower probability of becoming 
glycosylated compared to an NIS motif [84], which 
might imply that a lower probability of Af-glycan attach- 
ment to N332 is associated with the development of 
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Table 1 Individual's data summary 



Individual 


Geometric 
mean IC 50 
titer 3 


Number of 

viruses 
neutralized 13 


Date of 
SC 


Months from 
SC to CVI d 


CD4 count 
at SP e 


Viral load (log) 
at SP f 


Env 
sequences 
obtained 9 


Neutralization 
assay 
performed 11 


SH 
analysis 
performed 1 


ACH 19829 


297 


6 


26-jan-86 


10.5 


440 


3.9 


gp160 


+ 


+ 


ACH 18969 


275 


5 


29-Oct-87 


1.8 


520 


4.3 


gp160 


+ 


+ 


ACH 19308 


231 


5 


14-jul-91 


3.3 


380 


4.4 


gp160 


+ 


+ 


ACH 18814 


230 


6 


27-jul-87 


3.2 


500 


5.1 


gp160 


+ 


+ 


ACH 1881 8 


205 


4 


26-jan-86 


8.2 


510 


5.4 


gp160 


+ 


+ 


ACH 19885 


180 


5 


28-dec-94 


2.9 


580 


3.3 


gp160 


+ 


+ 


ACH 11 668 


169 


5 


28-jun-86 


7.9 


380 


4.5 


gp160 


+ 


+ 


ACH 19463 


164 


5 


14-jul-86 


2.1 


350 


4.8 


gp160 


+ 


+ 


ACH 19507 


140 


6 


23-apr-88 


7.9 


400 


5.0 


gp160 


+ 


+ 


ACH 19474 


126 


5 


27-jul-87 


12.7 


410 


3.0 


gp160 


+ 


+ 


ACH 19999 


111 


4 


26-aug-85 


4.3 


1060 


4.6 


gp160 


+ 


+ 


ACH 19793 


93 


4 


28-jun-86 


12.3 


1000 


4.5 


gp160 


+ 


+ 


ACH 19588 


91 


4 


20-nov-85 


13.8 


600 


5.3 


gp160 


- 


- 


ACH 19566 


78 


5 


11-Mar-85 


13.2 


610 


3.5 


C1-V5 


- 


- 


ACH 19768 


64 


5 


6-apr-86 


2.2 


370 


3.7 


gp160 






ACH 18860 


64 


2 


25-aug-86 


2.6 


550 


4.9 


C1-V5 


- 


- 


ACH 18839 


64 


3 


1 7-Oct-86 


3.3 


330 


3.8 


C1-V5 


- 


- 


ACH 19542 


45 


3 


23-jun-85 


4.5 


310 


3.9 


gp160 


- 


- 


ACH 18766 


43 


1 


15-jul-88 


5.1 


760 


4.3 


gp160 


- 


- 


ACH 19861 


43 


3 


24-feb-89 


3.0 


470 


4.3 


gp160 


+ 


+ 


ACH 19453 


39 


3 


23-dec-85 


7.0 


810 


3.0 


gp160 


- 


- 


ACH 18829 


38 


3 


2-feb-87 


4.0 


770 


2.6 


C1-V5 


- 


- 


ACH 19792 


34 


2 


8-nov-85 


6.9 


450 


4.9 


gp160 


+ 


+ 


ACH 19961 


31 


0 


14-sep-87 


2.4 


530 


4.5 


gp160 


+ 


+ 


ACH 19329 


31 


1 


28-dec-94 


4.3 


780 


3.9 


gp160 


+ 


+ 


ACH 19974 


29 


2 


26-jan-86 


8.8 


550 


3.5 


gp160 


+ 


+ 


ACH 19489 


29 


0 


30-sep-85 


7.4 


550 


3.1 


gp160 


+ 


+ 


ACH 18887 


29 


1 


7-dec-89 


8.8 


670 


4.6 


gp160 


+ 


+ 


ACH 19342 


28 


2 


14-jul-86 


3.0 


700 


3.9 


C1-V5 






ACH 19576 


23 


1 


2-feb-88 


7.0 


500 


4.5 


gp160 


+ 


+ 


ACH 18880 


20 


0 


28-jun-86 


10.4 


860 


4.8 


gp160 


+ 


+ 



Neutralization titers are expressed as the geometric mean of the reciprocal plasma dilution that inhibited virus infection by 50% for the panel of 6 
heterologous viruses. 

b The number of viruses neutralized, at least 3-fold higher than the IC 50 of the same serum with MLV, from the panel of 6 heterologous viruses from 

different subtypes. 

c Date of seroconversion (SC). 

d Months post SC to time of clonal virus isolation (CVI). 
e CD4 + T-cell (CD4 + ) count at setpoint (SP). 
f Log viral load at setpoint (SP). 
9 Env fragment sequenced. 

individuals from which viruses were used in the neutralization sensitivity assay with MAbs and polyclonal HIVIg pools are indicated with +. 
'individuals from which viral sequences were used in the Sequence Harmony model are indicated with +. 



CrNA. There is no straightforward way to directly assess 
glycan occupancy of position 332 on virus isolates, but we 
can study this indirectly by investigating the neutralization 
sensitivity to bNAbs that are dependent on the presence 
of a glycan at position 332, such as 2G12 and PGT126. 



Thus, we tested the sensitivity of the viruses from our 
infected individuals against 2G12 and PGT126, and di- 
vided them into two groups, for the presence of NLS or 
NIS at position 332. For 2G12 we excluded viruses that 
lacked one or more of the essential 2G12 glycans 295, 332, 
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Figure 2 Short VI sequences and lower probability of overall glycosylation correlate with the development of cross-reactive 
neutralizing activity. Scatter plots of individual's geometric mean IC 50 titer across the 6 virus panel (x-axis) versus sequence characteristics of the 
gp160s from clonal virus isolates on the y-axis. (A) length of gp160 in amino acids (AA); (B) length of variable region 1 (VI) in amino acids (AA); 
(C) total number of PNGS in gp160; (D) number of NXS motifs relative to the total number of PNGS. 



386 or 392 [61,62,85] (i.e. the selected viruses should in 
theory all be sensitive to 2G12). We defined resistance as 
>50% infectivity at the highest concentration of bNAb 
(25 ug/ml and 5 ug/ml for 2G12 and PGT126, respect- 
ively; Figure 3B&C). We found that 7 out of 21 viruses 
(33%) with an NLS motif were resistant to 2G12, while 
only 2 of 14 viruses (14%) with an NIS motif were resist- 
ant (Figure 3B). Furthermore, we found that 4 out of 30 
viruses (13%) with an NLS motif were resistant to 
PGT126, while none of the viruses with an NIS motif 
were resistant (Figure 3C). All NLS containing viruses 
that were resistant to PGT126 were also resistant to 
2G12. Although these differences did not reach statistical 
significance they are consistent with a lower probability 
of glycosylation of NLS motifs, of which the presence is 
associated with the development with CrNA. 

Sequence Harmony identifies specific amino acids that 
correlate with the induction of cross-reactive neutralizing 
activity 

The Sequence Harmony (SH) method ([86] and www.ibi. 
vu.nl/programs/seqharmwww) was used to identify amino 
acid differences in Env between individuals who developed 
CrNA and who did not (Table 2). Low SH-scores indicate 
positions where the amino acid composition is different 
between the two groups; a score of 0 indicates that the 



amino acids at a given position are completely different 
between both groups, while the maximum score of 1 in- 
dicates that the amino acid compositions are indistin- 
guishable. In addition, an empirical Z-score is calculated, 
reflecting the significance of the SH score obtained based 
on 100 random shuffling events of the sequences between 
the two groups. The SH analysis yielded 39 sites with a SH 
score below the defined cut-off values, indicating differ- 
ences in amino-acids present at these sites between the 
CrNA and non-CrNA groups, all with high (negative) 
Z-scores indicating high significance (Table 2). A phylo- 
genetic analysis showed the absence of sequence group 
clustering (Additional file 2: Figure S2), excluding spuri- 
ous findings based on phylogeny. Individual sequences 
were weighted in the calculation of the SH-scores, such 
that each individual had equal weight (see Methods for 
details). For comparison, we repeated the analysis with 
a single consensus sequence per individual. This yielded 
essentially identical scores (the correlation between SH- 
scores and Z-scores is both r = 0.98, and p < 0.0001), 
but less detailed information on the specific amino acid 
changes (Additional file 3: Table SI and Additional file 4: 
Figure S3). For the variable domains we used a cutoff of 
SH <0.7 and for the conserved domains a more relaxed 
cutoff of SH <0.85. This extended (higher) cut-off is neces- 
sary to find small differences within the conserved regions 
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PNGS motif at position 332 PNGS motif at position 332 

Figure 3 Lower probability of glycosylation at position 332 correlates with the induction of cross-reactive neutralizing activity. (A) Pie 

charts representing the distribution of NLS and NIS PNGS motifs at position 332 between 12 individuals who developed and 9 individuals who 
did not develop CrNA. Red and green represent NLS and NIS motifs, respectively. (B) Residual virus infectivity in the presence of high 
concentrations of 2G12 (25 ug/ml), excluding viruses lacking one or more of the essential 2G12 glycans 295, 332, 386 or 392 [61,62,85], for virus 
clones that have NLS or NIS motif. (C) Residual virus infectivity in the presence of high concentrations of PGT126 (5 ug/ml), for virus clones that 
have NLS or NIS motifs. Values in the gray area are considered sensitive, while values in the white area are considered resistant. 



that have an overall higher sequence similarity. 22 sites 
were located in the variable regions, including 1 distinct 
site with the maximal score of 0.0 located in the VI region 
which also contains 10 other sites scoring below 0.7 (z- 
scores ranging from -8 to -69). In the multiple sequence 
alignment, 6 insertions occur within the VI region be- 
tween positions 140 and 141 and are in agreement with 
the observed longer VI region in individuals who did not 
develop CrNA (Figure 2B). One site is located in the V3 
region (position 322), 8 sites in V4 (396, 399, 403-407 and 
412) and 2 sites in V5 (461 and 464) all with z-scores ran- 
ging from -9 to -35. As conserved regions show inher- 
ently higher sequence conservation than the variable 
regions, a less strict cut-off of SH <0.85 was used. This 
resulted in the selection of 17 sites: 6 in the CI region (32, 
33, 85, 87, 97 and 130), 3 in C2 (268, 271 and 275), 6 in C3 
(333, 336, 337, 343, 346 and 347), and 1 site in C4 (S440) 
(z-scores ranging from -3 to -16). The change at position 
333 in C3 corresponds to the second position of the N332 
glycosylation motif (see above). One additional relevant site 
was identified in the ectodomain of gp41 (621). 

Structural mapping of the Sequence Harmony results 
reveals clustering of amino acids that are associated with 
the induction of CrNA 

To better understand the impact of the results gained with 
SH, we mapped the positions that differed significantly on 



the three-dimensional structure of Env (Figure 4). A num- 
ber of gpl20 crystal structures were combined, as de- 
scribed in detail in the Methods section, to generate a 
structure of the Env spike that contains all gpl20 residues 
(Figure 4A and B). Gp41 was not included as it contains 
only one position that was identified by SH (at position 
621). Although this model may not be an accurate repre- 
sentation of the Env spike with correct positions and con- 
formations of the variable domains, it is useful to visualize 
the residues identified by the SH analysis. The carbon 
trace is indicated in red, while the residues selected by the 
SH analysis are shown in yellow space filling. Mapping of 
the residues identified by SH on the 3D structure of Env 
reveals that the residues that differ between the Envs, 
which are derived from individuals that did develop and 
those that did not develop CrNA, cluster together in 
specific domains. First, a number of residues cluster in the 
VI (Figure 4C and D). Second, a large cluster includes a 
major portion of the V4 as well as part of the underlying 
C3. Looking into these positions in more detail it is evi- 
dent that a number of C3-residues located on the side of 
the a2 helix that faces the V4 are different between the 
two sequence groups and those residues that differ in the 
V4 are all close to the a2 helix (Figure 4E and F). A third 
cluster of residues can be observed in the gp41 -interactive 
domain, including residues in CI and C2, although these 
residues are more scattered throughout the domain, 
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Table 2 Sequence Harmony results 
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Region 


Amino acid position 3 


SH-scores b 


Z-scores c 
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-5.5 
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1 
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1 


C1 
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-4.8 


Vkaei 


Ver 


2 


C1 


V87 
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-4.9 


Ekg 


Eadg 


2 


C1 


K97 


0.840 


-5.1 


Kint 


Knr 


3 


C1 


K130 


0.843 


-8.5 


Ne 


Nd 


4 


V1 


L134 


0.620 


-11.4 


Lf 


VL_aim 


5 


V1 


K135 


0.346 


-17.0 


RKeqg 


-hknwg 


5 


V1 


N136 


0.698 


-12.9 


Nt 


N-t 


5 


V1 


T138 
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Tn_ 
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6 


V1 


N139 
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-12.5 


N_t 


-nist 


6 


V1 
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Tiks_n 
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6 


V1 


.141 
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SR-egk 


6 


V1 
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-tn 


N 


6 


V1 


.141 
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-sktni 
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6 


V1 


.141 


0.000 
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-h 


T 


6 


V1 


.141 


0.000 
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- 


Nads 


6 


V1 


.141 
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-16.1 


-ts 
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6 


V1 


N141 
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NTs 
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6 


V1 


S142 
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Ns 
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6 


V1 


.145 


0.518 
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Snw_ 




V1 


.146 
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-15.8 


-m 


Ll-vg 




V1 
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-8.3 


E-GRtak 


GK-qte 


6 


C2 


E268 


0.814 


-4.6 


Ekqg 


Edgr 


7 


C2 


V271 
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-15.3 


Vi 


Va 


8 


C2 


V275 
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-9.0 
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3 


V3 


K322 
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-17.4 


DE 


Eaq 


9 


C3 


I333 


0.804 


-15.5 


Li 


II 


10 


C3 


A336 


0.684 


-9.1 


TAgvel 


Aekv 


10 


C3 


K337 


0.686 
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10 


C3 
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AV 
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F396 


0.578 


-16.4 


NT-wf 
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-10.9 
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-t 
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.404 
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V4 
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V4 
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.412 


0.689 


-11.6 




-snt 





van den Kerkhof et al. Retrovirology 2013, 10:102 
http://www.retrovirology.eom/content/1 0/1 /1 02 



Page 8 of 1 9 



Table 2 Sequence Harmony results (Continued) 



C4 


S440 


0.729 


-6.8 


Raks 


KReg 


12 


V5 


S461 


0.650 


-14.0 


DE-sn 


-n 




V5 


.464 


0.692 


-9.4 


NEK- 


-rk 


13 


V5 


.464 


0.557 


-15.2 


-nda 


-Nti 


13 


gp41 


Q621 


0.540 


-9.4 


Kestardq 


EMYQNK 





a Amino acid numbering based on the HXB2 reference sequence. 

b Sequence Harmony scores (SH-scores): SH-scores below a cut-off of <0.7 for variable regions and a cut-off of <0.85 for conserved regions are shown. 

c The z-scores display the accuracy of a given result; the more negative the z-score the less likely it is that the results was found by chance. 

d The 'consensus string' shows the residues found in each group on that specific site ordered by frequency were a lower case letter is present less than half the 

amount of the most abundant residue (upper case). 

individuals who developed CrNA. 

individuals who did not develop CrNA. 

9 Numbers represent the subclusters as defined in Additional file 5: Table S2. 



compared to those in the first two clusters. It is also obvi- 
ous that some large regions in the protein do not contain 
any changes, most notably the CD4BS and the V2. Only 
one change is present in the bridging sheet, and also only 
one in the V3. The residues identified by SH can be fur- 
ther subdivided into thirteen subclusters based on their 
proximity to each other (<9 A; Additional file 5: Table S2). 

The induction of cross-reactive neutralizing activity does 
not correlate with sensitivity to bNAbs 

We tested whether the early HIV-1 variants from indi- 
viduals who developed CrNA were more sensitive to 
neutralization by bNAbs covering all known epitope 
clusters (gpl20 outer domain: 447-52D, 2G12, PGT121 
and PGT126; CD4BS: bl2 and VRC01; quaternary 
epitopes: PG9, PG16 and PGT145; MPER: 4E10 and 
2F5). We did not observe that the viral variants of indi- 
viduals who developed CrNA were more sensitive to 
neutralization by different bNAbs compared to viruses 
from individuals who did not develop CrNA (Figure 5). 
We also analyzed the neutralization data for each bNAb 
per individual and we observed that the neutralization pat- 
tern was not dependent on the level of CrNA (Additional 
file 6: Table S3). We next tested the sensitivity of these 
viruses to three different polyclonal HIVIg pools in which 
multiple epitope specificities should be present. Interest- 
ingly, early HIV-1 variants of individuals who developed 
CrNA showed a trend towards being more sensitive to 
neutralization by polyclonal HIVIg compared to early 
viruses from individuals who did not develop CrNA 
(Figure 5), which could be consistent with them displaying 
a more open Env structure. However, this was only statis- 
tically significant for one of the three HIVIg pools 
(p=0.037). 

Discussion 

In the Amsterdam Cohort Studies on HIV infection and 
AIDS (ACS) of men who have sex with men (MSM) 
infected with HIV-1 subtype B, the prevalence of CrNA 
at 2-4 years post-SC is around 33% [40,79] . This roughly 



corresponds to what has been found in other cohorts 
[33,37-39]. In this study we aimed to identify differences 
in the properties of early Env proteins in 31 HIV-1 
infected individuals in regard to the development of 
CrNA. We observed that a short VI region and a lower 
probability of AMinked glycosylation at PNGS correlated 
with the development of CrNA later in infection. More 
specifically, a lower probability of glycosylation at pos- 
ition 332 was associated with the development of CrNA. 
However, we note that the studies on the glycosylation 
efficiency of NXS/T motifs have been performed with a 
non-HIV protein [84] and the probability of glycosylation 
is likely to be context dependent. We can, therefore, not 
make definitive statements on the actual glycosylation of 
individual viruses or groups. Although some of the ob- 
served associations were marginally significant, and re- 
quire confirmation in independent studies, they are in line 
with the hypothesis that HIV-1 Envs from individuals who 
develop CrNA have a more open structure, which allows 
for increased accessibility of bNAb epitopes, and possibly 
more efficient induction of bNAbs. To confirm this 
hypothesis we conducted a neutralization assay with early 
HIV-1 clonal variants from only the individuals who did 
or did not develop CrNA and found that early viruses of 
individuals who developed CrNA were indeed more sensi- 
tive to neutralization by one polyclonal HIVIg sample (but 
not by two other HIVIg samples), although we were not 
able to map this to specific bNAb epitopes. We next used 
SH for a more detailed analysis of amino acid differences 
at specific positions in Env and their role in the induction 
of CrNA. This resulted in the identification of 39 Env resi- 
dues which differed significantly between the two groups, 
based on defined SH-scores and z-scores. These residues 
showed a remarkable clustering in VI, C3-V4, and some- 
what more diffusely in CI -C2. In addition, a few individual 
changes were observed: one in V3, one in C4, three in the 
V5, and one in the gp41 ectodomain. Strikingly, we did 
not find any residues in the V2 domain, in the CD4BS, or 
in the triple layered structure that translates CD4 induced 
conformational changes to gp41 [88]. We note that the 
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Figure 4 Three dimensional clustering of amino acid positions that differ between individuals that induce CrNA and those that do not. 

Side (A) and top (B) views of the Env spike. The structures of three gp1 20 protomers, including the entire gpl 20 sequence and modeled as 
described in the Methods section, were fitted into the cry-EM density of the virus-associated Env spike [87]. A backbone trace of one protomer is 
colored in red and the positions revealed by SH are indicated in yellow space filling in the same protomer. (C, D). Model of gpl 20 with the same 
color-code as in A and B. Various Env subdomains are indicated in black font. Residues identified by SH are labeled in blue. The views are from 
the approach of CD4 (C) or rotated by 90° over the y-axis (D). The viral membrane would be at the top and the target membrane at the bottom. 
(E, F). Details of the V4 domain and its association with the a2 helix of the C3 domain. The residues identified by SH are indicated in sticks. 

V J 



differences we found might be specific for individuals 
who are infected with HIV-1 subtype B, and should be 
reproduced in other non-subtype B cohorts. 

Our data revealed an inverse correlation between VI- 
length and the development of CrNA. Two previous 
studies have looked for Env signatures associated with 
the presence of CrNA, but these studies used contem- 
poraneous samples for Env analysis and assessment of 
neutralization breadth and potency, and it is not likely 



that the contemporaneous Envs represent the Envs that 
induced the CrNA [89,90]. Nevertheless, in the first 
study, involving clade C infected individuals, short VI V2 
domains correlated with the presence of CrNA, but the 
VI and V2 were not studied separately [90]. In the sec- 
ond study, involving individuals infected with different 
clades, short V2 and short V5 regions correlated with 
the presence of CrNA in the sera [89]. Again, the caveat 
in these studies is that the Env sequences and sera were 



van den Kerkhof et al. Retrovirology 2013, 10:102 
http://www.retrovirology.eom/content/1 0/1 /1 02 



Page 10 of 19 



10,000- 
1,000- 
100H 



i 1 °- 



m 1 
O 



0.1 
0.01 



e 



0 




0.001 - — i 1 1 1 1 1 1 1 1 1 1 r 

b12 VRC 447 2G12 PGT PGT PG9 PG16 PGT 2F5 4E10 1 



01 -52D 



121 126 



145 



CD4BS gp120 outer domain/ 
V3 



Quaternary MPER 



HIVIg 



Figure 5 Sensitivity to bNAbs does not correlate with the induction of cross-reactive neutralizing activity. Sensitivity to neutralization by 
b12, VRC01, 447-52D, 2G12, PGT121, PGT126, PG9, PG16, PGT145, 2F5 and 4E10 and polyclonal HIVIg (three different sera pools) of viral variants 
obtained from 12 individuals who developed CrNA (grey bars) and 9 individuals who did not develop CrNA (white bars). Median IC 50 values per 
individual per bNAb or HIVIg as determined by linear regression are used and differences were considered statistically significant when P values 
were < 0.05, represented by an asterix. The bNAbs are ordered from left to right according to their epitope cluster. 



derived from the same time point during chronic infection; 
therefore it cannot be excluded that the Env characteristics 
observed were a consequence rather than a cause of the 
CrNA in serum, due to viral escape from the CrNA. In 
contrast, we studied the Envs of early viruses (median 5 
months post-SC), before the presence of CrNA, and re- 
lated the characteristics to the development of CrNA later 
in infection. It is arguably more likely that the observed 
Env signatures we identified indeed contributed to the 
shaping of the CrNA. Although other choices could have 
been made in the timing of our sampling, Moore et al. 
showed that transmitted/founder viruses raise autologous 
neutralization, which triggers the evolution of escape vari- 
ants around month 6 (roughly around the same time as 
our sampling of env sequences), which in turn induce 
CrNA [77]. 

The SH-analysis confirmed a role for VI, but in addition 
to the deletions/insertions (at positions 141, 145 and 146) 
identified a number of other amino acid changes in VI 
that were associated with the development of CrNA (at 
positions 134-140, 142 and 151). It has long been known 
that VI and V2 are not absolutely required for function 
[10,91], but that they play an important role in resistance 
to antibody neutralization [28,31,92-96]. On a population 
level the continuous neutralizing antibody-driven evolu- 
tion of V1V2 during the pandemic appears to have 
resulted in the elongation and increased glycosylation of 
the VI V2 domains [97]. It may therefore not come as a 
surprise that short VI domains, with a specific amino 
acid composition, might affect the induction of CrNA. 



We note, however, that we found no residues in V2 that 
differed between the two groups. This could suggest 
that VI length and composition are more important deter- 
mining factors in the induction of CrNA than V2 length 
and composition, at least in our study population. 

Previous studies indicated that the Envs from transmitted/ 
founder HIV-1 variants contain fewer PNGS compared to 
Envs from viruses in chronic infection [98]. The increase in 
number of PNGS coincides with a decreasing sensitivity to 
autologous neutralization and is proposed to be driven by 
the pressure of neutralizing antibodies [8,22,29,30,32,99]. 
Furthermore, a reduced prevalence of CrNA in recently 
infected individuals compared to historic serum samples 
was associated with an increased number of PNGS in early 
Env sequences [97], suggesting that an increase in Env 
glycosylation over the course of the epidemic results in de- 
creased induction of CrNA. Thus, an increased number of 
PNGS and a higher level of A/-linked glycosylation could 
interfere with the induction of bNAbs, as these glycans 
may interfere with the binding of the B cell receptor to 
protein epitopes on Env. We did not observe a correlation 
with the absolute number of PNGS and the later presence 
of CrNA. However, we did observe that a lower probability 
of glycosylation at the PNGS in early HIV-1 variants was 
associated with CrNA presence later in infection. In previ- 
ous Env sequence analyses, NXS and NXT PNGS motifs 
have always been treated equally, but NXS motifs are two 
to three times less likely to be glycosylated than NXT 
sequons [82,83,100]. Furthermore, when an NXS motif is 
present, the amino acid at the second position (the X) 
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becomes highly relevant to determining the probability of 
glycosylation [84]. 

Zooming in on the NXS motif at position 332, the 
glycan which is critical for the binding of many known 
bNAbs [52,59-62,74] and which was associated with 
neutralization escape from early strain-specific anti- 
bodies and induction of CrNA [77], we observed a cor- 
relation between the amino acid at the second position 
and the presence of CrNA in serum later in infection. 
Thus, individuals who developed CrNA had significantly 
more often an NLS motif at position 332 compared to the 
individuals who did not develop CrNA who usually had 
an NIS motif. An NLS motif is two to three times less 
likely to be glycosylated compared to an NIS motif [84], 
implying that a lower chance of glycosylation at position 
332 is associated with the development of CrNA. The 
probability of glycosylation and how HIV-1 might use this 
is unchartered territory and should be considered in 
other studies on the role of Env glycosylation in the co- 
evolution of the virus and the human immune system. 

The association of a lower probability of glycosylation 
at position 332 with the induction of CrNA is counterintu- 
itive when taking into account that many known bNAbs 
target this glycan [52,59-62,74]. There are two possible ex- 
planations for this paradox. First, this glycan is not often 
targeted in our study population and its absence increases 
the accessibility of other bNAb epitopes. Second, it is not 
the presence of this glycan per se that facilitates the induc- 
tion of bNAbs to this site, but the emergence of this site 
during infection, possibly by means of escape from Abs 
that targeted the surrounding region in the absence of the 
glycan. This later scenario was indeed observed in individ- 
uals infected with clade C Env [77]. The situation in our 
cohort was slightly different because the NXS motif at 
position 332 was present in all sequences, only the 2 nd 
position changed over time. Furthermore the absence of 
the PNGS at position 332 in transmitted subtype C was 
accompanied by the presence of a PNGS at position 334 
[77], while we never observed a PNGS motif at position 
334 in our sequences. The impact of the PNGS at one of 
these two positions, and its relation with subtype specific 
transmission and antibody escape and induction, requires 
further study in different cohorts. 

The SH analysis identified multiple amino acid posi- 
tions in the C3 and V4 region that differed significantly 
between individuals that did or did not develop CrNA. 
Inspection of a quaternary model of the Env spike showed 
that many of these positions form two subclusters on the 
a2-helix in the C3 region (HXB2 numbering 335-352) 
and the V4 loop (Figure 4). One cluster contains 6 resi- 
dues: 333 (which may modulate glycosylation at position 
332; see above), 336 and 337 from C3, and 405-407 from 
V4. The second subcluster involves 5 residues: 343, 346 
and 347 from C3, and 396 and 399 from V4. The C3 a2- 



helix interacts intimately with V4 and several of the differ- 
ences found between individuals that did or did not 
develop CrNA might impact the interaction between the 
two subdomains. For example, it has been reported that 
interactions between residues 335 and 412, and 337 and 
412 play an important role in the interaction of these two 
regions through electrostatic interactions [101]. In our SH 
analyses, the Env sequences of individuals who developed 
CrNA, position 337 is often occupied by a negatively 
charged amino acid (E), whereas this position is mostly 
positively charged (K) in individuals that did not develop 
CrNA. A similar charge reversal is observed for position 
343 which interacts with a number of V4 residues. Thus, 
changes in electrostatic interactions may influence the 
interaction between C3 and V4. Moreover, Kirchherr et al 
identified three amino acid substitutions in the V4 region 
(393G, 397G, and 413N) that were associated with greater 
neutralization potency and breadth [102]. Positions 393 
and 397 are not observed in our SH-analysis but the 
neighboring residues, 396, 399 and 400, are. Position 413 
is observed in our analysis and is part of a PNGS. Envs as- 
sociated with CrNA more frequently had an Asparagine 
(N) at position 413. Gnanakaran et al noted that the C3 
a2-helix in clade B viruses is more hydrophobic and 
shielded from solvent by the V4 loop and glycosylation, 
while this domain is less hydrophobic, more exposed and 
more immunogenic in clade C viruses [101,103]. Collect- 
ively these findings imply that the way in which the V4 
folds over and covers the a2-helix of C3 might affect the 
induction of bNAbs [77]. 

The impact of other residues highlighted by the SH- 
analysis is somewhat unclear. Several hits were found in 
CI and C2, including the (3-sandwich domain which is 
known to interact with gp41 [104]. The two sites identified 
by the SH-analysis at positions 461 and 463/464 in V5 are 
in close proximity to the CD4BS and might impact its ac- 
cessibility. Two other sites which were significantly differ- 
ent between the two groups were at position 322 (V3) and 
440 (C4). Position 322 is important for coreceptor usage, a 
conversion of a negatively charged residue to a positively 
charged residue is sufficient to switch a CCR5 -using to a 
CXCR4-using virus [105,106]. All the sequences we used 
were derived from CCR5 -using viruses (data not shown), 
ruling out any bias caused by differences in coreceptor 
usage. Interestingly Rosen et al showed a very strong co- 
variation between residues 322 and 440 that is influenced 
by charge [107], pointing at a potential functional electro- 
static interaction between these two residues. If and how 
these changes contribute to the development of CrNA 
remains to be further studied. 

We had hypothesized that early HIV-1 variants from 
individuals who later developed CrNA might have an in- 
creased sensitivity to bNAbs. We used multiple bNAbs 
covering four well known target epitopes, namely CD4BS 
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(bl2 and VRC01), gpl20 outer domain (447-52D, 2G12, 
PGT121 and PGT126), quaternary VI V2 (PG9, PG16 and 
PGT145) and MPER (2F5 and 4E10). However, we did not 
observe a significantly increased sensitivity to neutraliza- 
tion by these bNAbs. One could argue that this lack of 
correlation is due to the fact that individuals with CrNA 
are analyzed as a group without considering the epitope 
specificity of their CrNA. Each individual might have 
developed different specificities, blurring the results for 
individual bNAbs. Sera of 14 of our individuals were 
previously analyzed for binding to a panel of gpl20 core 
proteins and their corresponding CD4BS knockout 
mutants and showed the presence of CD4BS directing 
Abs [92], but we could not find a correlation between 
the binding of CD4BS Abs and neutralization sensitivity 
of early clonal virus variants to VRCOL Another explan- 
ation could be that the specificities responsible for 
CrNA in our individuals may be different from those of 
the monoclonal bNAbs we tested. We did observe dif- 
ferences in sensitivity to the polyclonal HIVIg: clonal 
viral variants from individuals who developed CrNA 
showed a trend towards being more sensitive to HIVIg 
might be consistent with a more exposed Env structure. 
In summary, we cannot conclude which bNAb epitopes 
were immunogenic in our study population. 

In a previous study we showed that the kinetics of 
CrNA development coincided with the kinetics of the 
induction of autologous neutralizing response, which 
was directed against viruses isolated early after SC [34]. 
This suggests that the development of CrNA is driven 
by epitopes that are exposed on early viruses [80]. Alter- 
natively, CrNA may be induced by early (or slightly 
later) HIV-1 variants that have already escaped from the 
initial autologous neutralizing response [77]. The latter 
observation is consistent with the hypothesis that HIV-1 
escape variants selected by autologous NAbs, in turn, 
may elicit new NAbs with altered specificities that allow 
for broad reactivity [52]. We studied early Env variants 
from a relatively wide time span (2 to 14 months after 
SC), therefore we cannot rule out or support one of the 
above hypotheses. 

Currently known bNAbs take years to develop and 
show accumulation of many somatic mutations, 
suggesting that their development is driven by years of 
antigen exposure and possibly reflecting the continuous 
co-evolution of HIV-1 and the immune response di- 
rected to it [51,52,80,108,109]. This would suggest a 
more prominent role for viral evolution and diversity in 
the development of bNAbs. Consistent with this, 
greater Env diversity early in infection was associated 
with greater NAb breadth later in infection in a cohort 
of antiretroviral therapy naive Kenyan women that were 
mostly infected with subtype A [110]. In our present 
study, involving therapy naive, subtype B infected MSM 



from the ACS; we did not observe a correlation between 
early Env diversity and the development of CrNA in 
serum (data not shown). 

Conclusions 

In summary, our current results show that sequence and 
structural characteristics of Env from early subtype B 
HIV-1 viruses may be associated with the development 
of CrNA in serum during infection. We observed that 
the presence of a short VI and lower probability of glyco- 
sylation, specifically at position 332, were associated with 
the induction of CrNA. In addition, a number of amino 
acid changes that mostly clustered in VI and C3-V4 corre- 
lated with the development of CrNA. Additional studies 
are needed to further clarify the role of these amino acids 
in the induction of CrNA, but the determinants for CrNA 
induction described here might facilitate the design of 
vaccines aimed at inducing bNAbs. 

Methods 

Individuals and viruses 

Samples studied here were derived from participants of 
the ACS of MSM who were infected with HIV-1 subtype B. 
From all ACS participants, 292 were previously tested for 
the presence of CrNA in their sera at 2-4 years post-SC 
[79]. Sera were tested for CrNA on a panel of 6 heterol- 
ogous viruses from different subtypes [39] and ranked 
based on their geometric mean IC 50 titer and on the num- 
ber of viruses from the panel that were neutralized. For the 
present study we selected cohort participants for whom the 
date of seroconversion was known, who had a follow-up of 
at least 4 years, and were therapy naive at the time of 
screening for CrNA. Moreover, clonal HIV-1 variants had 
to be available. These criteria were fulfilled by 32 individ- 
uals who seroconverted between 1984 and 1996 [35,40,79]. 
One of these individuals was defined as elite neutralizer, 
according to the definition of Simek et al [39] with a geo- 
metric mean IC 50 titer of 782. We excluded this individual 
from our study as it is the subject of other follow-up 
studies, leaving us with 31 individuals fulfilling our 
criteria (Table 1). 

Clonal virus variants were previously obtained from co- 
cultures of peripheral blood mononuclear cells (PBMCs) 
from HIV-1 infected individuals and 3-day phytohem- 
agglutinin (PHA) stimulated PBMCs from healthy donors 
[111,112]. As virus isolation by coculturing PBMCs from 
infected individuals with stimulated healthy donor PBMCs 
can result in selection of a variants that are more fit for 
replication in PBMCs in vitro, we used a protocol in which 
limiting numbers of PBMCs from infected individuals and 
stimulated healthy donor PBMCs were mixed in multiple 
parallel cultures. This allows for the isolation of multiple 
replication competent clonal variants, avoiding the out- 
growth and loss of slowly replicating variants [111,112]. 
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Moreover, to prevent sequence changes during in vitro 
culture, the number of passages in PBMCs was kept to a 
minimum. This method yields in clonal sequences that are 
very similar to sequences derived from single genome 
amplification [113]. The ACS are conducted in accordance 
with the ethical principles set out in the declaration of 
Helsinki, and written consent was obtained prior to data 
collection. The study was approved by the Academic 
Medical Center Institutional Medical Ethics Committee. 

Gp160 sequence analysis 

The HIV env genes from proviral-DNA isolated from 
PBMCs, that were infected in vitro with a single clonal 
HIV-1 variant, were PCR amplified and subsequently 
sequenced as described previously [114-116]. Complete 
gpl60 sequences could be analyzed for 26 individuals, 
whereas only C1-V5 env sequences could be analyzed 
for individuals ACH19566, 19342, 18860, 18839 and 
18829 (Table 1). Nucleotide sequences were aligned 
using ClustalW in the software package of BioEdit [117], 
and edited manually. The reference sequence HXB2 was 
included in the alignment to number each aligned resi- 
due according to the corresponding position in this ref- 
erence. Genetic analyses were performed on gpl60 
sequences starting at nucleotide position 91, thereby ex- 
cluding the Env signal peptide. The total length of the 
gpl60 sequences and the separate regions were calcu- 
lated by counting the number of amino acids. The num- 
ber of PNGS and number of NXT or NXS motifs were 
identified using N-Glycosite [118] at the Los Alamos 
HIV database website (http://www.hiv.lanl.gov/content/ 
sequence/GLYCOSITE/glycosite.html). Sequences with 
double PNGS were counted by N-Glycosite as followed: 
NNSS as +1 NXS motif, NNTT as +1 NXT sequon, and 
NN[TS][ST] as +1 NXT motif. A proline at the second 
position (site pattern NP[ST]) is strongly disfavored for 
glycosylation and therefore excluded as a PNGS. Net 
electrostatic charges of gpl60 were calculated by counting 
all charged amino acid residues per sequence, where 
residues R and K counted as +1, H as +0.293, and D and E 
as -1. Intra-individual genetic diversity of the complete 
Env sequences generated from the earliest time point were 
analyzed for 23 individuals, using the Kimura-2 parameter 
model of evolution in the MEGA 4.1 software package 
(http://www.megasoftware.net). 

Phylogenetic analyses 

A Maximum Likelihood (ML) tree was constructed with 
complete HIV-1 env gpl60 sequences from 26 individ- 
uals with diverse levels of CrNA. The best-fit nucleotide 
substitution model (GTR+I+G), selected by hierarchical 
likelihood ratio test (hLTRs, Model Test 3.7 [119] was 
implemented in the heuristic search for the best ML tree 
applying TBR branch-swapping algorithm using PAUP*4.0 



[120], starting with a Neighbor- Joining (NJ) tree construc- 
ted under the Hasegawa-Kishino-Yano (HKY85) model 
of evolution [121]. The robustness of the NJ phylogeny 
was assessed by bootstrap analysis with 1,000 rounds of 
replication. 

Neutralization assay 

Neutralization sensitivity to known bNAbs and poly- 
clonal HIVIg pools was tested in a PBMC based assay, 
for a minimum of one and a maximum of five clonal 
virus variants per individual. PBMCs were isolated from 
buffy coats obtained from healthy seronegative blood 
donors and cultured as described previously [112]. The 
neutralization sensitivity was tested for the twelve indi- 
viduals who developed CrNA and the nine individuals 
who did not develop CrNA. The individuals who devel- 
oped CrNA were included in this group following the 
criteria of having a geometric mean IC 50 titer > 90 in 
serum and the ability to neutralize > 5 viruses from the 
panel of six heterologous viruses in a pseudovirus assay 
developed by Monogram Biosciences. The panel 
consisted of six pseudoviruses with envelope sequences 
from primary isolates of HIV-1 subtypes A (94UG103), 
B (92BR020 and JRCSF), C (93IN905 and MGRM-C-026) 
and CRF01_AE (92TH021) [35]. Individuals of whom the 
sera had a geometric mean IC 50 titer < 45 in serum and 
neutralized < 2 viruses from the panel of 6 heterologous 
viruses were included in the group that did not develop 
CrNA (Table 1). In total eleven broadly reactive neutraliz- 
ing monoclonal antibodies (bNAbs) were tested; IgGl bl2 
(kindly provided by NIBSC; EVA3065), 2G12, 2F5 and 
4E10 (kindly provided by NIBSC; ARP3277, EVA3063, 
ARP3239), PG9 and PG16 (AIDS reagent program #12149 
and 12150), VRC01 (AIDS reagent program #12033), 447- 
52D (kindly provided by NIBSC; ARP3219)) and PGT121, 
PGT126 and PGT145 (kindly provided by IAVI NAC re- 
pository); and three pools of polyclonal HIVIg sera: pool 1 
(AIDS reagent program #3957 lot. nr. 12-100158), pool 2 
(AIDS reagent program #3957 lot. nr. 14-120074), and 
pool 3 (AIDS reagent program #3957 lot. nr. 11-098130; 
NABI lot. nr. IHV-50-111 and lot. nr. IHV-250-114 [122]). 
HIVIg is a pool of concentrated antibodies from the blood 
from HIV-positive asymptomatic persons with high levels 
of HIV-1 specific antibodies. 

From each virus isolate, a final inoculum of 20 50% tis- 
sue culture infective doses (TCID50) was incubated for 1 
h at 37°C with each specific monoclonal antibody in three- 
fold serial dilution. The starting dilution was 25 (ig/ml for 
bl2, 2G12, 2F5, 4E10 and 447-52D, and 5 ug/ml for PG9, 
PG16, VRC01, PGT121, PGT126 and PGT145. To test the 
neutralization sensitivity of the virus clones against HIVIg 
a final inoculum of 20 TCID50 was incubated for 1 h at 
37°C in threefold serial dilution with a starting dilution of 
1500 ug/ml. Subsequently, 10 5 PHA-stimulated PBMCs, 
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derived from healthy blood donors, were added to the 
mixture. After incubation with HIVIg an additional wash- 
ing step was performed with phosphate-buffered saline 
after 4 h of incubation at 37°C. On days 7 and 11, virus 
production in culture supernatants was analyzed with a 
p24 antigen capture enzyme-linked immunosorbent assay 
[123]. The percent neutralization was calculated by deter- 
mining the reduction in p24 production in the presence of 
the bNAbs or HIVIg compared to the cultures with virus 
only. When possible, IC 50 s were determined by linear 
regression. When the analyses required the assignment of 
one IC 50 value per individual, we took the median of the 
IC 5 o values of the individual virus clones. For calculations, 
viruses with IC 50 values below the lowest dilution or above 
the highest dilution were assigned the lowest or highest 
dilution value, respectively. 

Statistical analyses 

Statistical analyses on the gpl60 sequence and the 
neutralization data were performed using Graphpad 
Prism v5.01. Differences between sequences (length, 
number of PNGS, NXS or NXT motifs and net electro- 
static charge) were compared using a Spearman correlation 
test, with the geometric mean IC 50 titer across the six 
heterologous viral panel as the absolute x-value. Differences 
and correlations were considered statistically significant 
when P values were < 0.05. In cases of normal distribution 
of the data we used the unpaired t-test (for example to 
compare the mean neutralization titers for the eleven 
known bNAbs and polyclonal HIVIg pools between 
individuals that did or did not develop CrNA). When 
neutralization titers were not distributed normally, a 
Mann- Whitney test was performed to compare the two 
groups. For each individual, the median IC 50 values of 
each bNAb or HIVIg were used. Difference in NIS vs. 
NLS glycosylation at position 332 and its correlation 
with the presence of CrNA was compared using a 
Fischer s exact test. 

Multiple sequence alignment for Sequence Harmony 

A multiple sequence alignment was performed on 91 Env 
sequences from 21 individuals, starting at nucleotide 
position 91, which excludes the Env signal peptide, with a 
minimum of one and a maximum of eleven sequences per 
individual. In total we obtained 58 sequences from twelve 
individuals who developed CrNA and 33 sequences from 
nine individuals that did not develop CrNA (Table 1). In 
order to create a Multiple Sequence Alignment of suffi- 
cient consistency in the variable regions, three of the most 
widely used tools, Praline, Muscle and Clustal [124-126], 
were used with a range of settings (different Pam/Blosum 
matrices, different gap open and gap extend penalties) and 
evaluated. None gave completely satisfactory results, so 
the final alignment was based on Praline with global pre- 



profiling and default settings. The variable regions were 
extensively adjusted manually using the Jalview alignment 
viewer and editor [126], by optimizing the alignment of 
sequence patterns such as the NXS/T glycosylation sites 
and guided by the conservation scores reported by Jalview, 
which are based on the conservation of physicochemical 
properties, without penalizing the introduction or elong- 
ation of gaps (Additional file 7: Figure S4). 

Comparison of viral sequences with Sequence Harmony 

The Sequence Harmony (SH) method ([90] and www.ibi. 
vu.nl/programs/seqharmwww) was used to analyze amino 
acid differences between the env sequences of the twelve 
individuals who developed CrNA and the nine individuals 
who did not develop CrNA (Table 2). The SH algorithm is 
an entropy-based method, which detects positions within 
an alignment that display compositional differences in re- 
lated protein sequences divided in two groups, and might 
therefore be linked to functional differences. In addition, 
an empirical Z-score is calculated, reflecting the signifi- 
cance of the SH-score obtained based on 100 random shuf- 
fling events of the sequences between the two groups. For 
details see reference [86] and the online documentation on 
the web server (www.ibi.vu.nl/programs/seqharmwww). SH 
measures the overlap in distribution of amino acid types 
between two subgroups (A and B; in this case env se- 
quences obtained from individuals who developed CrNA 
and individuals who did not) at a certain position (/) in the 
sequence alignment as follows: 



SHf B = lpt log 



where pf x indicates the observed frequency in group A for 
amino acid type x at position i in the sequence, and pf x 
analogously for amino acid frequencies observed in 
group B sequences. The final SH score is calculated by 
SHi = Y2 ( SHf /B + SHf /A ). Therefore, an SH score of 0 
indicates amino acid positions that are specific for one 
of the sequence groups, whereas an SH score of 1 indi- 
cates a complete overlap at this amino acid position 
between the two groups. In our dataset, the number of 
sequences included per individual varies from 1 to 11, 
which makes the analysis biased towards individuals 
with larger numbers of sequences included. To adjust for 
this bias we extended the SH method by including a 
weight for each sequence. We assigned a weight w=l/N p 
to each sequence as the inverse of the number of se- 
quences N p included for individual p. Each individual 
will therefore have the same impact on the total score in 
each group. Cut-off scores were set as SH<0.7 for the 
residues in the variable regions and <0.85 for residues 
in the conserved regions, i.e. a less strict selection in the 
conserved region to allow also small(er) differences to 
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be detected. The lower (negative) the z-score, the less 
likely that the results were found by chance. 

Structural analysis 

SH yielded a list of amino acid differences in Env between 
individuals who did developed CrNA and who did not, 
but structural analysis informs us on the location of these 
positions in the protein structure. A structure of the 
complete Env trimer is not available. In order to assist the 
inspection of the positions that were different between 
the two groups as identified by SH, we built a model 
that contained all gpl20 domains. We started with the 
gpl20 structure from 3JWD [104], because it contains 
the N- and C-termini of gpl20. To this structure we 
added the VI V2 loops from 3U4E [64] and using 1GCG 
[127] to obtain an overlap with the 3 J WD structure; as 
well as the V3 and V4 regions from 2B4C [128]. These 
structures were overlaid using the super' and 'pair_fit' 
functions of PyMol (The PyMOL Molecular Graphics 
System, Version 1.5.0.4 Schrodinger, LLC). No additional 
loop modeling was performed, so the actual conformation 
of the VI V2, V3 and the V4 regions as presented should 
be considered an arbitrary visualization of the residues 
identified by SH. Note that the glycans are absent from 
this model. The sites selected by the SH analysis (i.e., the 
residues scoring below the cut-off, as described above) 
were mapped on the three-dimensional structure of the 
resulting model using PyMol. 

Availability of supporting data 

The sequences supporting the results of this article are 
available in the GenBank repository [Genbank: EU743974- 
743976; EU743978-743979; HQ644487 1-644872; JF910158- 
910162; JF910166; JF910176-910178; JF910181-910183; 
JF910188-910190, http://www.ncbi.nlm.nih.gov/genbank]. 

Additional files 



who did not develop CrNA, for details see Methods. The SH algorithm is 
an entropy-based method, which detects positions within an alignment 
that display compositional differences in related protein sequences 
divided in two groups, and might therefore be linked to functional 
differences. In addition, an empirical Z-score is calculated, reflecting the 
significance of the SH-score obtained based on 100 random shuffling 
events of the sequences between the two groups. Cut-off scores were 
set as SH<0.7 for the residues in the variable regions and <0.85 for 
residues in the conserved regions, i.e. a less strict selection in the 
conserved region to allow also small(er) differences to be detected. The 
lower (negative) the z-score, the less likely that the results were found by 
chance. 

Additional file 4: Figure S3. Title of data: Multiple sequence alignment 
of gp160 sequences from CrNA and non- CrNA individuals used for SH 
analysis. Description of data: Multiple sequence alignment of 91 
complete gp160 sequences from 21 individuals, starting at nucleotide 
position 91, excluding the Env signal peptide, with a minimum of one 
and a maximum of eleven sequences per individual. In total 58 
sequences from twelve individuals who developed CrNA and 33 
sequences from nine individuals that did not develop CrNA, CrNA and 
non-CrNA respectively, are depicted. 

Additional file 5: Table S2. Title of data: Subclustering of positions 
identified by SH. Description of data: Six sites are isolated in the structure 
(and sequence) and form a cluster of their own. Three small clusters are 
purely sequential. Another sequential cluster (T138, N139, T140, N141, 
S142) is not present in any of the crystal structures analyzed, but aligns 
with a similar region (T138^T135, N139^I136, N141^N137 in 3U4E) 
which makes it cluster with another residue (K151 at 4A). Finally, there 
are four other clusters containing sequentially distant residues with on 
average slightly more than four residues per cluster. In one case this 
clustering is tight, with a distance around 3 A, the three other distances 
involved are around 6-7 A. One of these clusters consists of three 
sequential parts and involves one tight and one looser distance. Most 
clusters are restricted to within one of the V or C regions, except for the 
one that connects C1 and C2, and the two others that bridge C3 and V4. 
Almost all residues selected within the V4 region are in close contact 
with the ct2-helix (336-353) of the C3 region. 

Additional file 6: Table S3. Title of data: Median neutralization IC 50 
titer per individual per bNAb. Description of data: Median neutralization 
IC 50 titer of the primary HIV-1 variants from 21 selected individuals to 
neutralization by eleven different bNAbs covering the four known 
epitope clusters, namely CD4BS (b>12 and VRC01), gp1 20 outer domain 
(447-52D, 2G12, PGT121 and PGT126), quaternary V1V2 (PG9, PG16 and 
PGT145) and MPER (2F5 and 4E10), and to the three polyclonal HIVIg 
pools. Interestingly, early HIV-1 variants of individuals who developed 
CrNA were more sensitive to neutralization by polyclonal HIVIg pool 1 
compared to early viruses from individuals who did not develop CrNA 
(p = 0.037), although this was found for the other two pools tested. 

Additional file 7: Figure S4. Title of data: Correlation of SH-scores and 
Z-scores. Description of data: (A) Correlation of the SH scores as found by 
Sequence Harmony; (B) Correlation of the Z-scores found by Sequence 
Harmony. Each amino acid difference in Env between individuals who 
did or did not developed CrNA is represented by one dot. The value on 
the x-axis represents scores found when using 91 Env sequences as 
described in Materials, and the value on the y-axis represents the scores 
found when using the consensus sequences of the 12 individuals who 
did and the 9 individuals who did not developed CrNA. Statistical 
analyses on the SH-scores and the Z-scores were performed using 
Graphpad Prism v5.01. Correlations between the two methods were 
compared using a Pearson correlation test. Differences and correlations 
were considered statistically significant when P values were < 0.05. 



Abbreviations 

HIV-1: Human immunodeficiency virus type-1; AIDS: Acquired 
immunodeficiency syndrome; CrNA: Cross-reactive neutralizing activity; 
NAbs: Neutralizing antibodies; Env: Envelope glycoprotein complex; 
PNGS: Potential N-linked glycosylation sites; post-SC: Post-seroconversion; 
CD4BS: CD4-binding site; bNAbs: Broadly reactive neutralizing monoclonal 
antibodies; (SH): Sequence Harmony; ACS: Amsterdam Cohort Studies on HIV 



Additional file 1: Figure SI. Title of data: Baseline characteristics for 
the individuals with non-CrNA, intermediate CrNA and CrNA. Description 
of the data: (A) time, in months, between SC and CrNA level measured 
by Monogram Biosciences; (B) time, in months, between SC and isolation 
of clonal viral variants; (C) CD4 + T cells/ul blood at set-point; (D) viral 
load, in log 10, at set-point. Individuals with non-CrNA, intermediate CrNA 
and CrNA in their serum are represented by circles, squares and triangles, 
respectively. Each individual is represented by one symbol. 

Additional file 2: Figure S2. Title of data: Genetic relationships 
between viruses from individuals with diverse levels of CrNA in serum. 
Description of data: Complete gp160 sequences derived from 26 
individuals with varying levels of CrNA in serum were aligned and a ML 
tree was constructed. Individuals with high, intermediate and low CrNA 
are indicated in red, blue or green, respectively. 

Additional file 3: Table SI. Title of data: Sequence Harmony results 
with consensus sequences. Description of data: The Sequence Harmony 
(SH) method ([90] and www.ibi.vu.nl/programs/seqharmwww) was used 
to analyze amino acid differences between the consensus env sequences 
of the twelve individuals who developed CrNA and the nine individuals 



van den Kerkhof et al. Retrovirology 2013, 10:102 
http://www.retrovirology.eom/content/1 0/1 /1 02 



Page 16 of 19 



infection and AIDS; MSM: Men who have sex with men; PBMCs: Peripheral 
blood mononuclear cells; PHA: Phytohemagglutinin; ML: Maximum 
Likelihood; NJ: Neighbor-Joining; TCID50: 50% tissue culture infective doses. 
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